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Max-stable processes are the natural analogues of the generalized extreme-value distribution for 
1—5 the modelling of extreme events in space and time. Under suitable conditions, these processes are 

asymptotically justified models for maxima of independent replications of random fields, and they 

are also suitable for the modelling of joint individual extreme measurements over high thresholds. 

This paper extends a model of Schlather (2002) to the space-time framework, and shows how a 

^ pairwise censored likelihood can be used for consistent estimation under mild mixing conditions. 

^ Estimator efficiency is also assessed and the choice of pairs to be included in the pairwise likelihood 
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, cxa, is discussed based on computations for simple time series models. The ideas are illustrated by an 
application to hourly precipitation data over Switzerland. 
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1 Introduction 



Under suitable conditions, max-stable processes are asymptotically justified models for maxima 
of independent replications of random fields. Since they extend the generalized extreme-value 
distribution of univariate extreme value theory to the functional setting, they thus appear to be 
natural models for spatial extremes, de Haan's (1984) spectral representation of such processes 
implies that there are infinitely many max-stable processes, and in practice the challenge is to 
build flexible but parsimonious models that can capture a wide range of extremal dependencies. 
Parsimony is important since extremal data are often scarce, but flexibility is also crucial since 
a poor fit might lead to mis-estimation of the risk. Several models for max-stable processes 
have been proposed: Smith (1990) proposes a max-stable model with deterministic storm shapes, 
and Schlather (2002) proposes a model based on a Gaussian process. Other models include the 
Brown-Resnick processes (see Kabluchko and Schlather, 2010), or a Brownian motion model 
proposed by Buishand et al. (2008), which has the drawback of not being invariant with respect 
to coordinate axes. Wadsworth and Tawn (2012) generalize these max-stable models to hybrid 
spatial dependence models able to capture and handle both asymptotic dependence and asymptotic 
independence. Reich and Shaby (2011) propose a finite-dimensional construction of max-stable 
processes that can be fitted in the Bayesian framework with Markov chain Monte Carlo methods. 
Other modelling approaches for spatial extremes, based either on copula or on latent processes, 
are presented by Davison et al. (2012). 

The full likelihood cannot be obtained analytically for most max-stable processes (but see 
Genton et al., 2011). However, since the bivariate marginal densities can usually be derived, 
inference can be based on a composite likelihood. Much has been written on pseudo-, quasi- 
or composite-likelihood: see for example Hjort and Varin (2008), Lindsay (1988), Varin (2008), 
Varin and Vidoni (2005), Cox and Reid (2004), or Padoan et al. (2010) and Davison and Gholam- 
rezaee (2012) for applications to spatial extremes. Such likelihoods are robust to misspecification 
of higher distributional assumptions and have nice theoretical properties, but so far have been 
applied only to componentwise maxima. An important extension, which improves inference by 
incorporating more information, is to perform pairwise threshold-based inference for max-stable 
processes, analogous to the use of the generalized Pareto distribution in the univariate case. This 
will be addressed in this article. 

In Section 2, we tie together geostatistics and statistics of extremes to construct asymptotically 
valid space-time models for extremes. The spatio-temporal aspect of this modelling is novel, 
though related work include Davis and Mikosch (2008) and Davis et al. (2011). Section 3 is focused 
on inference and describes the methods based on pairwise likelihood, while Section 4 addresses the 
loss in efficiency of the estimation procedure compared to classical maximum likelihood estimation 
and gives some suggestions about the choice of pairs to be included in the pairwise likelihood. 
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Section 5 describes simulations to validate our approach, and Section 6 describes its application 
to space-time modelling of rainfall. Some concluding discussion is given in Section 7. 



2 Threshold modelling for extremes 
2.1 Marginal modelling 

The classical theory of extreme values addresses the large-sample fluctuations of the maximum 
M n of a sequence of independent and identically distributed random variables X 1} . . . ,X n whose 
distribution F has upper terminal Xf = sup{x : F(x) < 1} G lRU{+oo}. If sequences {a n } > and 
{b n } C R exist such that (M n — b n )/a n converges in distribution to a non-degenerate distribution 
G, then this must necessarily be the generalized extreme-value (GEV) distribution, that is G(x) = 
exp[-{l+£(a;-?7)/V}~ 1/? ], defined on the set l+£(x— rf)/r > 0, with 77 e R, r > 0,£ G 1R and with 
the value for £ = being interpreted as £ — > 0. A complementary result describes the stochastic 
behaviour of peaks over a high threshold u: if this limiting result holds for maxima, then as 
u — > xf the conditional distribution of X — u, given that X > u, converges to the generalized 
Pareto distribution, GPD(<r, £) (Davison and Smith, 1990). The distribution of such a variate is 



where the scale parameter is linked to that of the GEV distribution by a = t + — rj). A closely 
related characterization of extremes relies on point processes. If the limiting result holds for 
maxima, then as n — » oo the two-dimensional point process {i/(n+ 1), (Xj — b n ) / a n }™ =1 converges 
to a non-homogenous Poisson process on regions of the form [t%, tj[ X [u, oo), < t\ < ti < 1, with a 
certain intensity (see Leadbetter, 1991; Smith, 1989). In practice, the data often exhibit temporal 
dependence, and the aforementioned asymptotic results can be extended to stationary sequences 
with short-range dependence (see Leadbetter, 1983), where serial dependence of the extremes is 
summarized by the extremal index. For more details about extreme-value statistics, see Coles 
(2001), Beirlant et al. (2004), Embrechts et al. (1997) or de Haan and Ferreira (2006). As the tail 
may be well approximated by a GPD, the distribution F of X can be consistently estimated by 



where F(x) is the empirical distribution function of the sample X%, . . . ,X n , ( u is the estimated 
probability of exceeding the threshold u and £ and a are estimates of £ and a. The transformation 
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t(x) = — 1/ logF(x) therefore approximately standardizes the observations to have the unit Frechet 
distribution exp(— 1/x) for x > 0. 

Joint modelling of extremes is crucial for a realistic assessment of risk, and the next section 
describes models for spatial or spatio-temporal extremes, where the margins have been previously 
transformed to the unit Frechet scale. 

2.2 Max-stable processes 

A spatial random process Z(x) defined for x G X C M. d and with unit Frechet margins is said 
to be max-stable if for any finite set T> C X and any function z(x) defined on T>, the following 
property is satisfied: 

Pr{Z(x)/n < z(x),x G V} n = Pr {Z(x) < z(x),x G V} , n= 1,2,.... 

As the class of GEV distributions coincides with that of univariate max-stable distributions, the 
marginal distributions of a max-stable process must be GEV, and therefore, if {Yi(x) : x G X C 
IR d }, i — 1, 2, . . ., are independent and identically distributed replications of a random process with 
arbitrary margins, and if there exist sequences of continuous functions {a n (x)} > and {b n (x)} 
such that 

a~ 1 {max(Y 1 ,...,Y n ) -b n } ^ Z*, n ->> oo, 

where Z* is a non-degenerate random field, then Z* must be max-stable with GEV margins. 
Consequently, the only possible non-degenerate limits for properly linearly renormalized maxima 
of random processes are max-stable processes, which are therefore asymptotically justified models 
for spatial extremes, de Haan (1984) proved that a process Z with unit Frechet margins is max- 
stable if and only if it can be represented as 

Z(x) = snp ^Wi(x), (1) 

i>l 

where the £j's are the points of a Poisson process on R + with intensity £~ 2 d£ and where the Wj's 
are independent replicates of a non-negative random process W(x) with mean 1. We can think of 
the W{S as random storms in space and of the £j's as their intensities. Due to the characterization 
(1), no finite parametrization exists for max-stable processes. 

From (1), it can be straightforwardly shown that the joint distribution of the process Z at N 
distinct locations is 

Pr{Z(xi) < Zi, . . . , Z(x N ) < z N } = exp ^— E 

(2) 



max 

i=l,...,N 



W{ 



exp{-V N (z!,...,z N )}, 
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where the exponent measure Vjv(-), which summarises the extremal dependence structure, is ho- 
mogeneous of order —1 and satisfies Vjv(oo, oo) = 1/z for any permutation of the 
iV arguments. When Zj = z for all i = 1,...,N, we obtain Pt{Z(x\) < z,...,Z(xn) < 
z] = exp{— Vjv(l, . . . , l)/z} = {exp(— l/^)}*^ 1 '-' 1 ), so the so-called extremal coefficient On = 
Vjv(1, . . . , 1) can be seen as a summary of extremal dependence, and has two bounding cases: 
complete dependence, On = 1, and asymptotic independence, On = N. 

Different choices for W(x) yield different models for spatial maxima, with more or less flexible 
dependence structures. For our purpose, i.e., the modelling of extreme rainfall (see §6), the model 
proposed by Davison and Gholamrezaee (2012) and originally due to Schlather (2002), seems 
suitable. They consider a truncated Gaussian random process for W(x), so that storm shapes are 
stochastic, and include a compact random set, which allows one to capture complete independence 
of the extremes. The model is defined by taking 



Wi(x) oc max{0,£j(x)}/ Bi (x - Xi 



x e x, 



(3) 



where X is compact in IR d and the £j are independent replicates of a Gaussian random field with 
correlation function p(h), Ig is the indicator function of a compact random set B C X, the Hj 
are independent replications of B, and the Xi are points of a Poisson process of unit rate on X, 
independent of the £j. The proportionality constant in (3) is chosen to satisfy E{Wi(x)} = 1. 

A common feature of the max-stable models thus far proposed is that the exponent measure 
Vjy is known for N = 2. Genton et al. (2011) provide a closed-form expression of the likelihood 
function for the Smith max-stable model indexed by M> d at iV < d + 1 sites (d > 1), but typically 
only the bivariate margins are known. Moreover the number of terms involved in the likelihood 
increases at a combinatorial rate as iV increases. Therefore, standard likelihood-based inference 
seems to be out of reach. Following Davison and Gholamrezaee (2012), a pairwise likelihood 
approach is considered (see Section 3). The bivariate exponent measure for the model with (3) 
can be expressed in the stationary case as 



V 2 (z 1 ,z 2 ) 




1 - 



a{h) 



1 - 



1-2 



where h = x 1 -x 2 , a(h) = E{\B n (h + B)\}/E{\B\) and 
Hence the pairwise extremal coefficients are 



{p(h) + l}z lZ2 
(zt + z 2 f 




(4) 



is used to denote the volume of a set. 



2 {h) = V 2 {lA) = 2-a{h){\ 



1 - p(h) 



(5) 



As mentioned in Abrahamsen (1997, p. 38), a valid isotropic correlation function p(h) in M 2 satisfies 
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p(h) > —0.403. Therefore, if there were no random set B, i.e., B = X and a(h) = 1, 62(h) would 
be bounded above by 1.838 and complete independence could not be captured by this model even 
at very large distances. With this model, and since B is chosen to be compact, for modelling 
purposes we can choose B so that a(h) — > and thus 62(h) — > 2 as h — > 00 for any correlation 
function p(h). This model is built from random sets with a Schlather model inside, so the short- 
range dependence is largely determined by the correlation p(h), while the longer-range dependence 
is regulated by the geometry of the random set B. There are clearly other possibilities for the 
model inside the random set, but for concreteness we consider just one here. 

In a more general framework, the correlation function need be neither isotropic nor stationary, 
and could therefore depend on the spatial locations x\ and X2 rather than on their distance 
and lag vector h. We would then have non-stationary extremal coefficients. 

In the context of modelling space-time extremes the points x G X have coordinates in space 
S = M 2 and time Tel, that is, x = (s, t) 6 X = S X T . The function p must therefore be a valid 
space-time correlation function (Gneiting, 2002; Cressie and Huang, 1999; Davis et ai, 2011). 

In the following section, we show how to make the link from the asymptotic distribution for 
maxima to a joint model for the right tail. 



2.3 Censored threshold-based likelihood 

The convergence of block maxima to a max-stable process implies that all finite-dimensional dis- 
tributions converge to a max-stable distribution, i.e., to a multivariate extreme value distribution. 
Let {Y n (x) : x G X C lR d }, n = 1,2,... be independent and identically distributed replicates 
of a process Y(x) with Frechet margins. As explained in Section 2.2, the joint distribution of 
properly scaled block maxima at N sites in X is well approximated by exp{— Vn(z\, . . . ,Zn)}, 
where the exponent measure Vjv stems from the underlying spatial structure of the max-stable 
process. Hence, for a large fixed n, the joint distribution at iV locations is 

Pv{Y(x 1 )<z 1 ,...,Y(x N )<z N } = ([Pv{Y(x 1 )<z h ...,Y(x N )<z N }] n ) 1/n 

= Pr < max Yi(xi) < zi, . . . , max Yi(xn) < 

I i=l,...,n i=l,...,n 
f T/ (Zl Z N \^/n 

= exp <^ V N (—, — ) \ 

[ n \n n J J 

= exp{-V N (z 1 ,...,z N )}, (6) 

the last equality coming from the homogeneity of V/v- Hence, the model for maxima in equation (2) 
also provides a model for rare events of individual observations. This approximation is only good 
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for large positive values z±, . . . , z^ E R, since the impact of the approximation is negligible when 
Pr{y (xi) < Zx, . . . , Y(xn) < z^} is close to 1. Hence, the bivariate joint density of the process Y 
at locations x\ and X2 has the form d 2 exp {— V 2 (zi, Z2)} /dz\dz2 for large z\, z 2 . However, as this 
model is only valid when the two events are simultaneously extreme at both locations, we adopt 
a censored likelihood approach (see Coles, 2001, p. 155). Let the threshold u be sufficiently high 
that equation (6) is a valid model for Zi, Z2 > u. Then the likelihood contribution p u (zi, z 2 ) of a 
pair (-21,-22) is 



Pu(zi,Z 2 ) 



exp-f-V^i, z 2 )}, z l7 z 2 > u; 
^exjp{-V 2 (zi,u)}, 
^exp{-V 2 (u, z 2 )}, 
^ exp{-V 2 (u,u)}, 



Z\ > u, z 2 < u; 
z\ < u,z 2 > u; 
Zi, z 2 < u. 



(7) 



Different marginal thresholds could be used (Bortot et ai, 2000) and the approach could be 
generalized to higher dimensions. However, in practice, the probability that an observed iV-uplet 
falls into the "upper right quadrant" decays geometrically with N, leading to inference problems. In 
the next section, we will show that these censored threshold-based pairwise likelihood contributions 
provide consistent inference. 



3 Inference 

3.1 Pairwise likelihood approach 

As the full likelihood is not known for max-stable models, classical frequentist or Bayesian inference 
is impossible, and we adopt an alternative approach based on composite likelihood. An analogous 
approach in the Bayesian framework using a pseudo-posterior distribution based on a pairwise 
likelihood has been developed by Ribatet et al. (2011). Maximum composite likelihood estimators 
typically have similar asymptotic properties to the usual maximum likelihood estimator; often 
they are asymptotically normal and strongly consistent. 

Assume that the spatio-temporal process Z(x), x = (s,t) G X = S x T is observed at S 
monitoring stations and at times 1, . . . ,T, that is at iV = ST locations in X. Let z Stt denote the 
observation recorded at station s and time t, and consider the censored threshold-based pairwise 
likelihood 

T S S 

l M = J2J2J2J2( 1 ~ J { si ^ 52 and h = °» lo §^ Cw> z *w> °) > ( 8 ) 

t=l heKt si=l S2=l 
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with the corresponding maximum pairwise likelihood estimator 

9 PjK = axg max 1^(9), (9) 

where K. t = {h G K. : h < T — t} and /C C N U {0} is a finite collection of time lags, where p u is 
given by equation (7), the exponent measure V being given for example by (4) and where /{•} is 
the indicator function. If /C = {0, 1, . . . , K} for K < oo, this pairwise likelihood corresponds to 
summing up all space-time pairwise contributions, up to a maximum time lag K. If K = T — 1, 
it reduces to the full pairwise likelihood. However, the associated computational burden could be 
reduced and statistical efficiency gained by taking a different subset /C. For example, we could 
take {|_ afc-1 J '■ k — 1, . . . , K} U {0}, a > 1. In particular, when a = 2, we include the pairs at lag 
0, 1, 2, 4, 8, . . .. Another choice could be based on the Fibonacci sequence: 0, 1, 2, 3, 5, 8, 13, . . .. In 
Section 4, we will see that the choice of pairs is closely linked to the efficiency of 9 Pi jc and thus, a 
careful selection of them is essential. 

3.2 Asymptotics 

Davison and Gholamrezaee (2012) and Padoan et al. (2010) use pairwise likelihood for inference 
on max-stable processes, assuming independence between distinct annual maxima. In the case of 
spatio-temporal extremes, the asymptotic normality of 9 V ^ stems from a central limit theorem 
for stationary time series applied to the score U(6) = VZ(0) = J2 t =i u t(8), where U t {9) is the 
derivative of rightmost triple sums in equation (8) with respect to 9. However, as the elements 
Ut(9) are generally correlated over time t, we need an additional mixing condition in order for 
classical asymptotics to hold. A suitable mild sufficient condition is that the process Z(x) be 
temporally a-mixing, along with a condition on the rate at which the mixing coefficients a(n) 
must decay, ensuring that the correlation vanishes sufficiently fast at infinity. With this condition, 
two events become more and more independent as their time lag increases. In particular, all 
m-dependent processes are contained within the class of a-mixing processes. 

We call a space-time process Z(x),x = {s,t) G X = SxT temporally a-mixing with coefficients 
a(n) if for all s G S, for all sequences t n C T, the time series {Z(s,t n ),n G N} is a-mixing with 
coefficients a s (n) and where sup sgiS a s (n) < a(n) — > as n — > oo. For the definition of an a- 
mixing time series, see Bradley (2007, Definition 1.6). We can then obtain the following theorem, 
whose proof, which relies on the theory of estimating equations, is given in Appendix A. 

Theorem 1. Assume that Z(x) is a stationary spatio-temporal max-stable process which is tem- 
porally a-mixing with coefficients a(n). Moreover, suppose that for all 9 G 0, E[{[/i(^)} 2 ] < oo 
and that for some S > 0, one has E(|L r 1 (^)| 2+5 ) < oo and ^ n>1 |a(n)| 5/(2+(5) < oo. Then, if 9 is 
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identifiable from the bivariate densities, then 



T yi K{ e)-y*j x {e){e p , K - o) A Af(o, i p! . 



where 



J x {9) = E{-V fl C/i(0)}; (10) 



T 



k(6) = t-\sx r£2u t (e) 

= EiUiieWOf} + [^{Ui(9)Ut + i(0) T } + E {U t+1 (9)Ut(0) T }} (11) 

t=i ^ ' 

oo 

E{[/ 1 (^[/ 1 (0) r } + [E {C/iWC/wW'} + E {tW0)£M0) T }] < oo, I 1 oc. 



t=i 



This result shows that the standard asymptotic normality result for composite likelihoods 
(Hjort and Varin, 2008; Lindsay, 1988; Godambe and Heyde, 1987; Varin, 2008; Varin and Vidoni, 
2005; Cox and Reid, 2004; Padoan et at, 2010) still holds under mild conditions for moderately 
temporally dependent processes. Furthermore, the asymptotic variance turns out to be of "sand- 
wich" form, as is standard for misspecified models. 

If the process Z(x) were instead assumed to be Gaussian, and hence not max-stable, and if the 
pairwise likelihood were defined in terms of the marginal bivariate normal densities, then the mo- 
ment conditions of the theorem, i.e., E[{{7i(#)} 2 ] < oo and E(|L r i(#)| 2+<5 ) < oo, would be automati- 
cally satisfied for all 5 > 0, and thus the mixing condition would reduce to Yl n >i l a (^)| 1_e < °°> f° r 
some e > 0. Similar results were obtained by Davis and Yau (2011), who establish the asymptotic 
normality and the strong consistency of the maximum consecutive pairwise likelihood estimator 
for ARMA models, under a condition on the autocorrelation function. They also treat certain 
long-memory models. 



3.3 Variance estimation 

Variance estimation for 9 p jc is difficult due to the complicated form of the sandwich matrices in 
equations (10) and (11). The pairwise log likelihood is formed by summing the pairwise contribu- 
tions for the time lags in the set /C and across all S stations, so a single evaluation of the pairwise 
log likelihood requires 0(T|/C|5 2 ) operations, and the computation of (11) is still more intensive. 

The temporal dependence of the data suggests that block bootstrap or jackknife methods be 
used. For computational reasons, in our application we choose to apply a block jackknife, treating 
rainfall data from different summers as independent. For that purpose, we leave out yearly blocks 
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one at a time, and get pseudo- values of 9 P to estimate its variability, using the formula of Busing 
et al. (1999). Fortunately, the pseudo- values can be computed in parallel. 

4 Efficiency considerations 

In Section 3, we introduced our maximum pairwise likelihood estimator for spatio-temporal ex- 
tremes. Although it inherits its asymptotic properties from the traditional maximum likelihood 
estimator, the natural question of statistical efficiency remains to be addressed. It turns out that 
the loss in efficiency is closely related to the pairs that are included in the pairwise likelihood, that 
is to the choice of /C. Adding pairs might simultaneously increase the variability K{9) of the score 
and the amount of information, J(9), so it is unclear how the selection of pairs acts on the variance 
T^ 1 J(8)~ 1 K(9)J(6)~ 1 ; the amount of information contained in a single pair might be insufficient 
to counteract the increase of variability due to including it, so the choice of the optimal subset of 
pairs is not obvious. However, one might suspect that for short-range dependent processes, the 
pairs that are far apart in5xT are not as relevant for the estimation of a dependence parameter 
as are the close ones. Varin et al. (2011), Varin and Vidoni (2005) and Varin and Czado (2010) 
already suggested the elimination of non-neighbouring pairs. 

We studied the efficiency of the maximum pairwise likelihood estimator for time series models 
whose maximum likelihood estimators could be computed, hoping to gain a qualitative under- 
standing of how composite likelihoods behave in more complex settings. In the same vein as Davis 
and Yau (2011), we studied AR(1) and MA(1) processes, but with the different objective of under- 
standing how the asymptotic relative efficiency evolves as the set of time lags K, for the selection 
of pairs in the likelihood varies. Complementary results on the efficiency of pairwise likelihood 
may be found in Cox and Reid (2004), Varin and Vidoni (2009), Hjort and Varin (2008) or Joe 
and Lee (2009). 

Figure 1 displays the asymptotic relative efficiency (ARE) of the pairwise likelihood estimator 
with respect to the maximum likelihood estimator, that is avar(#MLE)/ var ($p,,/e)) f° r different sets 
K. of time lags. We considered (a) K,^ = {1, . . . , K}, for which all time lags are used up to some 
maximum time lag K; (b) ICff = {b^ : k = 1, . . . , K} where b^ is based on the Fibonacci sequence; 
and (c) K.f = {2 fc_1 : k — 1, . . . , K} for which the lags increase exponentially. Since the efficiency 
curves were found to be qualitatively similar for (b) and (c), we only present the results for K.^ and 
K.f. The left-hand column of Figure 1 displays the ARE for the AR(1) process, and the results for 
the MA(1) process are shown in the right-hand column. The top row shows the efficiency curves 
for /Cf" and the bottom row considers the set /C^ . As mentioned by Davis and Yau (2011), the 
efficiency is maximized when pairs at lag 1 only are included. 

In the top left panel (AR(1) and /C^), the ARE for the dependence parameter A is 100% when 
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K, = {1} and then decreases sharply before stabilizing at about lag 9. This shape is reproduced 
qualitatively in the bottom left panel, when only the pairs at lags 2 k are taken into account, but 
the efficiency stabilizes at a higher level. However, in practice, one might need to include more 
distant pairs to ensure parameter identifiability When the pairs at lags 1, 2, 3, 4, 5, 6 are included, 
the efficiency of the estimator, around 70%, is significantly lower than when the pairs at lags 
1, 2, 4, 8, 16, 32 are included. Therefore, for a fixed number of pairs, here 6, it is advantageous to 
include some distant pairs as well. Thus, for the AR(1), it is better to include not only strongly 
dependent pairs, but also weakly dependent ones. 

The results for the MA(1) process suggest that the efficiency is little affected either by the 
selection of pairs or by the number of time lags considered, but the ARE is extremely low for 
the dependence parameter A. Other results (not shown) reveal that the efficiency for A drops 
dramatically as A approaches ±1, so the loss in ARE is substantial even for moderately correlated 
MA(1) processes. 

When max-stable processes are considered, these results can only be treated as analogies. 
However, it seems that two main conclusions can be drawn: including many pairs in the pairwise 
likelihood can spoil the estimator, suggesting that we should retain as few pairs as possible, pro- 
vided the parameters remain identifiable, and incorporating information from temporally distant 
(or weakly correlated) pairs is valuable when the process is autoregressive. 

5 Simulation study 

The considerations on efficiency discussed in Section 4 being based on simple time series models, it 
is important to check to what extent the conclusions extend to max-stable processes. We therefore 
conducted a simulation study in a one-dimensional framework. We simulated the Schlather model 
(3) on the time axis, taking X = [0,2000], with random sets of the form B = [0,D], where 
D = 245 and 5 ~ beta(10, 240/^ - 10) with fi = 40/3, so E(D) = // ~ 13.3. We chose an 
exponential correlation for the underlying Gaussian random field e, with range parameter A = 4; 
the effective range is 12. These parameters were chosen to mimic rainfall data. The top panel of 
Figure 2 displays a realization from this model. 

Fixing the parameter \x of the random set to its true value, we then estimated the logarithm 
of the range parameter, log A, with the threshold-based pairwise likelihood estimator of equation 
(9). We tested different estimators corresponding to the three sets of time lags used in Section 4, 
namely JC^, JCff and K.^, for K — 1,6, 9. Table 1 reports the mean squared errors (MSE) of these 
estimates based on 300 realizations of the Schlather model. 

The MSE is minimized for /C = {1}, corroborating the findings of Section 4 for AR(1) or 
MA(1) processes. Moreover, the MSE is 13% lower when /C® is used instead of K,\ and 24% lower 
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Table 1: Mean squared errors (MSE) of the estimates of log A, the logarithm of the correlation range 
parameter, based on 300 replications of the Schlather model, for different sets of pairs included 
in the pairwise likelihood. 



Number of time lags K 


1 


3 


6 


9 


Type of time lags set /C 


K K 

^a/b/c 




/Cf 




Kf 


/Cf 




£f 


/Cf 


MSE 


0.100 


0.111 


0.115 


0.145 


0.132 


0.126 


0.180 


0.144 


0.136 



Table 2: Mean squared errors (MSE) and percentages of successful maximizations of the pairwise likeli- 
hood for the joint estimation of the mean duration \i of the random set and the logarithm of the 
range parameter, log A, when different sets of pairs are included in the pairwise likelihood. This 
simulation is based on 300 replications of the Schlather model. 



Number of time lags K 


1 


3 


6 


9 


Type of time lags set /C 


^a/b/c 


K K 

^a/b 


/Cf 




K 


/Cf 




/Cf 


/Cf 


MSE for log A 


0.148 


0.129 


0.113 


0.123 


0.111 


0.105 


0.133 


0.129 


0.119 


MSE for 


34.8 


21.5 


20.2 


14.9 


11.3 


14.0 


12.6 


10.1 


12.6 


Successful convergence (%) 


59.3 


70.7 


76.0 


82.7 


95.7 


97.3 


89.3 


92.7 


94.7 



when K? c is used instead of K? al even though the observations separated by more than 24 time 
units were independent. Thus the inclusion of some distant, less dependent, pairs can improve 
inference significantly for fixed K. 

The bottom panel of Figure 2 shows that the bias becomes less and less visible and the variance 
decays more and more as the number of observations T increases, confirming the theoretical results 
established in Section 3. The simulation suggests that we can estimate the dependence parameter 
consistently, as expected. 

Joint estimation of the correlation range parameter A > and the mean duration fi G (0, 24) 
of the random set is more difficult. Sometimes the estimate of fi reaches its upper bound; the 
percentage of successful convergence of the algorithm reported in Table 2 is only 59% when we 
choose /C = {1}, while it is respectively 83%, 96% and 97% for /C„, JCf and K,®. The estimators 
including distant pairs in the likelihood outperform those that do not or that only use the most 
dependent pairs. The same phenomenon is observed when K = 9, but the difference is less 
striking than for K = 6, as expected. In fact, the pairs at lags less than 6 are probably ineffective 
to estimate the duration of sets that last 13.3 time units on average, and that is why /Cf or /Cf 
are better choices than /Cf . The set of time lags /Cf seems slightly better than /Cf , in terms 
of percentage of successful maximizations of the pairwise likelihood. As far as MSE values are 
concerned, it again seems that the estimators including distant pairs outperform those that use 
only nearby pairs. Moreover, it seems that the sets of the form /Cf now have slightly smaller MSEs 
than /Cf . To sum up, both estimators that include pairs at lags in /Cf or /Cf behave appreciably 
better than /Cf , for fixed K. 
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6 Data analysis 



6.1 Description of the dataset 

The dataset used for our application is composed of hourly rainfall measurements (mm) recorded 
from 1981 to 2007 at ten monitoring stations in western Switzerland. Figure 3 illustrates the 
location and topography of the area of study. All stations are located between the Alps and 
the Jura mountains, and their altitude is almost constant. Only the periods from midnight on 
June 21st to 11 pm on September 20th were considered, summers being treated as mutually 
independent. The entire dataset comprises 503988 measurements, with up to 59616 data points 
per site. The rainfall time series, shown in Figure 4, were independently transformed to the unit 
Frechet scale, following Section 2.1, with quantile-quantile plots showing satisfactory agreement 
between the empirical and fitted quantiles. The thresholds were the 0.97-quantiles of each series. 
Due to the size of the dataset at each site, the margins were fitted with negligible variability. 
Below we focus on the modelling of extremal dependence, rather than on the marginal behaviour. 

Figure 5 gives an overview of the empirical pairwise extremal coefficients for all pairs of stations 
at different time lags, based on a censored version of the naive Schlather-Tawn (2003) estimator. 
There is evidence of significant spatial and temporal dependence between the different series. 
Panel (1, 1) shows the temporal extremal coefficients at Bern-Zollikofen; it starts with the value 
1 (complete dependence at lag 0), and seems to tend smoothly to the value 2 (independence) as 
the time lag increases. This pattern repeats itself for the other sites. The off-diagonal panels 
represent extremal coefficients for the different pairs of stations, and hence display space-time 
interactions. For example, Panel (1,4), in the 1st row and 4th column, displays the extremal 
coefficients between the rainfall time series at Luzern at time t and the rainfall time series at 
Bern-Zollikofen at time t + h, for h = 0,1, . . .,24. Panel (4, 1) reverses the role of the stations. The 
extremal coefficient functions differ for the panels, showing that the orientation of the stations 
matters. The extremal coefficient dips towards the value 1 at lags 1 or 2 when the stations are 
west-east oriented: during the summer months, western Switzerland is governed by dominant 
winds from the west or north-west, so that the clouds tend to discharge their rain first on the 
western part of Switzerland. The same rainfall event could therefore be recorded by two distant 
monitoring stations at a lag of 1 or 2 hours, depending on their location and on the wind velocity. 
Consequently, extremal dependence might be higher at lag 1 or 2 than at lag 0. A model for the 
data should be able to capture such features. 
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6.2 Model construction 



We now discuss the construction of a model based on (3) for the rainfall data described in Sec- 
tion 6.1. This space-time model comprises a standard Gaussian random field e(x) with correlation 
function p(h) and a random set element B, both defined on a space X = S x T, where S = IR 2 
denotes space and T = R + denotes time. 

The Gaussian random field is supposed to model the short-range behaviour of the process 
within single storms, so it is important to have a correlation function that can flexibly capture 
space-time interactions. For a good review of space-time correlation functions and a discussion 
of properties such as stationarity, separability and full symmetry, see Gneiting et al. (2007) and 
the references therein. Cressie and Huang (1999) propose classes of nonseparable spatio-temporal 
stationary covariance functions based on Bochner's theorem, and Gneiting (2002) extends their 
work by providing other very general flexible space-time covariance models. Davis et al. (2011) 
show that this class of covariance functions satisfies a natural smoothness property at the origin, 
directly linked to the smoothness of the random field, and is therefore suitable for the modelling 
of physical processes such as rainfall. We used the isotropic nonseparable space-time correlation 
function (Gneiting, 2002) 



where s and t are respectively distances in space and time, a s , at t £ IR determine spatial and 
temporal scale parameters, /3 a , £ (0,2) are spatial and temporal shape parameters, d = 2 
is the spatial dimension, and 7 £ (0,1) is a separability parameter quantifying the space-time 
interactions. As 7 approaches 1, the spatial and temporal components are less and less separable. 

The random set B is interpreted as a random storm having a finite extent, which enables the 
model to capture complete independence. Conceptualizing storms as disks of random radius R 
moving at a random velocity V for a random duration D starting from a random position, the 
storm extent B in space and time becomes a tilted cylinder in S x T, with a Schlather process 
inside; see Figure 6. For tractability we assume that R ~ Gamma {mn/kn, &r) (with mean 
km), V ~ J\f 2 (my , ft) (km/hour) and D ~ Gammafmc/fci), ko) (with mean mu hours). 



The fitting of our model requires the computation of the coefficient a(h) = E{\Br\(h + B)\}/E(\B\) 
for h £ X, i.e., the normalized expected volume of overlap between the random set B and itself 




(12) 



6.3 Model fitting 
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Table 3: Parameter estimates and standard errors from fitting our random set model to the rainfall data. 



Estimate (SE) 


correlation bcale 


Space 


exp(ct s ) (km) 


Q £ £ ( K Ci h 7\ 

oo.o [o.y 1 j 




Time 


exp(a 4 ) (hr) 


1 MM { C\ 1 A \ 

l.UU l^U.MJ 


Shape 


Space 

± llilc 


Pt 


U.yo ^U.Uo ) 
1 I ) 


Separability 




7 


0.99 (0.00) 


Random Set Duration 


Mean 


mn (hr) 


36.78 (0.34) 




Shape 


k D 


9.75 (0.01) 


Radius 


Mean 


m R (km) 


51.21 (0.16) 




Shape 


k R 


0.28 (0.05) 


Velocity 


Mean 


my (km/hr) 


32.67 (0.74) 






Q{{ 2 (km/hr) 


11.41 (0.16) 




Standard Deviations 


3.00 (0.01) 




Correlation 


nH 2 (km/hr) 

Pl2 


3.43 (0.00) 
-0.95 (0.03) 



shifted by the space-time lag h. Several mild approximations, some analytical calculations and a 
single one-dimensional finite integration yield a good approximation to a(h); see Appendix B. 

After some exploratory analysis we fixed (3 t = I, and then the model has four parameters for the 
correlation function, and nine for the random set. Due to the complexity of the problem, we split 
the estimation procedure into four parts: we estimate first the temporal parameters (a t , kjj), 
then the spatial parameters (a s , /3 a , trr, kn), then the spatio-temporal parameters (7, my, Q) with 
the other parameters held fixed to their estimates, and finally all the parameters together, with the 
former estimates as starting values. We always use the pairwise likelihood estimator (9). Standard 
errors are calculated by the block jackknife (see Section 3.3), using yearly blocks. Based on the 
results in Sections 4 and 5, we include the pairs at lags in K, = {0, 1,2,4,8, 16} in the pairwise 
likelihood. A single evaluation involves contributions for about T|/C||5| 2 = 50000 x 6 x 10 2 = 30 
million pairs, while the full pairwise likelihood would have 7 billion pairs, completely impractical 
for inference purposes! We coded the pairwise likelihood in C, parallelized the work on 8 CPUs, 
and fitted the model using the R optimization routine L-BFGS-B with specific box constraints. 
Due to the complex model and the amount of data, a single full estimation took 5 days. As the 27 
bootstrap replicates can be computed independently, 27 x 8 = 216 CPUs were used simultaneously 
to estimate the standard errors. The results are presented in Table 3. 

The estimated mean duration and mean radius m# of a storm are 37 hours and 51 km, 
which seem reasonable when compared to radar images of precipitation for the same region and 
time of year. The mean velocity my has components 33 and 11 km/hr, so the mean speed is about 
34.6 km/hr and the angle of the dominant winds is about 19° in the Argand diagram. This means 
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that the clouds are likely to move from west to east (and slightly to the north), in agreement with 
the dynamic of the oceanic climate in Western Switzerland during the summer. The correlation 
pi2 of the velocity is close to —1, so the angle of the velocity is less easily determined than its 
length. 

The separability parameter 7 always reached the bound 0.99, suggesting that the data are 
highly nonseparable and that this parameter tries to capture it. 

Overall the standard errors seem very small despite the large amount of data available. The use 
of monthly (instead of yearly) blocks produced similar standard errors, but this could be due to the 
inappropriateness of the bootstrap procedure or the instability of the pairwise likelihood around 
the maximum pairwise likelihood estimate. For computational reasons we could not investigate 
this further, so the standard errors should be interpreted with care. 

6.4 Model checking 

Figure 5 compares empirical estimates of the pairwise extremal coefficients with their model-based 
counterparts. There is a good agreement overall, but the model systematically underestimates 
extremal dependence at lag 1. This lack of fit at short time lags can be explained either by a 
lack of flexibility due to the (conceptually) simplistic model that we used or by the difficulty to 
reach the global pairwise likelihood maximum for such a model. The diagonal plots, showing the 
marginal dependence of the extremes, show a good fit. The small differences at Cham (CHZ) 
or Mathod (MAH) may be due to nonstationarity or because data at those monitoring stations 
might not have been cleaned properly; see Figure 4. The left panel of Figure 7 shows the pairwise 
extremal coefficients 82 in (5). 

As the model was fitted with pairs of observations, one might wonder whether it can cap- 
ture higher-order interactions. We therefore computed the trivariate extremal coefficients (see 
Appendix C) and found good agreement between nonparametric estimates of trivariate extremal 
coefficients and their model-based counterparts; see Figure 7. Despite the strong dependence 
among these trivariate estimates and with the pairwise counterparts, it seems that the trivariate 
interactions are fairly well modelled. The biggest discrepancies are from stations CHZ (Cham) and 
MAH (Mathod), but without these stations, the points are well concentrated around the diagonal. 

In order to assess the sensitivity of the results to initial conditions, we re-fitted the model with 
different starting values. The results were sometimes fairly different, but with similar bivariate 
properties and with almost the same likelihood. Consequently, we believe that some parameters 
are likely to play a similar role, giving rise to identifiability issues. Indeed, our stations are at 
most 150 km apart, which is not very distant, if a cloud of radius 50 km moves at about 35 km/hr. 
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7 Discussion 



The work described above extends existing statistics of extremes by proposing a flexible class of 
models for spatio-temporal extremes, applied here to rainfall, but with clear possibilities for exten- 
sion to other phenomena. 'Dynamic' space-time modelling of extremes thus seems to be feasible; 
complex models can be consistently fitted using composite censored likelihood based on threshold 
exceedances. However, the large amount of data and the consequent use of parallel computation 
underlines the necessity for substantial computing resources when tackling such problems: in our 
application, the fitting would otherwise have been completely out of reach. 

Although highly idealized, our model is still fairly complex, and estimation and simulation are 
demanding. Moreover, the assessment of fit is tricky, due to the computational burden that it 
requires. After a major effort we were able to check the trivariate interactions by means of the 
3rd order extremal coefficients, and although it would be feasible to use simulation to investigate 
higher order interactions, it would be awkward. 

An important modelling issue is that near-independence cannot be captured by our model, 
which is based solely on max-stable processes. However, it is common in practice to observe 
two distinct events becoming less and less dependent as their rarity increases. Wadsworth and 
Tawn (2012) have proposed models that can handle both asymptotic independence and asymptotic 
dependence, and it seems entirely feasible to extend our approach to them. 
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Appendix 

A Proof of Theorem 1 

Proof. For notational simplicity, we give the proof in the case where the parameter 9 is scalar, but the 
argument can be extended to the vector case. 

By definition of the pairwise likelihood in equation (8), and as the observations z s j are realizations of 
a max-stable process, we have 

s s 

z S2 j +h ;9) Ml - 7{si > s 2 and h = 0}). 



fcG/C 4 s 2 =l si=lv__; 



=0 

Therefore, we also have that E{J7(0)} = E{£^L X U t (9)} = 0. 

In addition, the variance of U(9) renormalised by T is (Shumway and Stoffer, 2004, p. 510) 

X— 1 

T-\ a r{U{9)} = E{Ux(9) 2 } + 2 £ (l - f\ V{U x {9)U t+1 {9)} 

t=i \ ' 

oo 

-> E{[/i(6>) 2 } + 2 ^ E{Z7i(e)t/i+i(e)}, as T — » oo, if the sum converges absolutely. 
t=l 

Now, as 9 Py )c is the maximum pairwise likelihood estimator, second-order Taylor expansion of Ut(9 Pt /c) 
around the true parameter 9 gives 

rp rp 

o = E = E { + ^vtiPWp* -9)}, 



which gives, up to a term of the order 0{(9 p ^ — 9) 2 }, that 

- 1 



{T ~\~^T 
E^)f 52u t (6) = + HiO)- 1 !/^), 
t=i J t=i 



(13) 



where Ht{9) = —dUt{9) / d9 and 77(0) = Ylt=iHt(9) is the observed information. Moreover, since the 
process Z{x) is assumed to be temporally a-mixing with coefficients a(n), the time series Ut(9) is also 
a-mixing with coefficients a'{n) = a(n — max/C). Hence, 

a'(n) -)■ 0, |a'(n)| 5 /( 2+<5 ) < oo, 

n>l 

with the same 5 > 0. These results, along with the assumptions E(Uf) < oo and E(|[/i| 2+<5 ) < oo, ensure 
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that the Central Limit Theorem 10.7 of Bradley (2007) applies, and thus 



T- l ' 2 U{6) A AA{0, K{9)}, T -> x . 



where K(9) = E{[/i(#) 2 } + 2 ^{U\(9)Ut+i(6)} < oo and denotes convergence in distribution. 

Therefore, coming back to equation (13), and by definition of J\{9), by the law of large numbers, and by 
Slutsky's theorem, we get 

T 1/2 (9 Py/c -9) = T 1 / 2 H{9)~ 1 U{9) 

= {T- l H{9)Y 1 {T- 1 ' 2 U{9)} 

A J 1 (9)- 1 M{0,K{9)} asT^oo 

= Ar(0,J 1 (9r 1 K(9)J 1 (9r 1 ), 

where = denotes equality in distribution. But K(9) is the asymptotic variance of the score, renormalized 
by T. Hence, the result is proved. □ 



B Computation of the volume of overlap a(h) 

The coefficient a(h) is defined as E{|/3 D (h + jB)|}/E(|S|), where B is a tilted cylinder in X = S x T = 
R 2 x R_|_ (see Figure 6), and h = (s,t) G X. If the cylinder were vertical (zero wind velocity), the volume 
of overlap would simply be the product of the area of overlap between two discs distant by ||s|| and the 
corresponding height, the storm duration minus t. 

Let R be the storm radius, V = (Vj.,1^) G M 2 be its velocity and D be its lifetime. A good linear 
approximation to the area of overlap of two discs of radius R distant by d is ttR 2 max{0, 1 — d/(2R)} 
(Davison and Gholamrezaee, 2012). Therefore, for a vertical cylinder B, \Bf](h + B)\ can be approximated 

by 

where a+ = max{0,a}. When the cloud is moving, giving a tilted cylinder, a simple geometric argument 
can be used to prove that in the general case, the volume of overlap is transformed to 

\Bn(h + B)\=nR 2 (l-^) (D~t) + , 

where d* = [\\s\\ 2 + t 2 (V? + V 2 2 ) - 2||a||t{Vi cob(0) + V 2 sin(fl)}] 1/2 , = arctan(si/s 2 ) being the angle 
between the stations with respect to a reference axis in the West-East direction. In order to compute the 
coefficient ct(h), which depends upon the spatial distance ||s||, the temporal lag t and the orientation of 
the stations 9, we need to obtain the expected volume of overlap E{|£> n (h + B)\}, by putting tractable 
distributions on R, D, and V = (V\, V2). We choose to set 
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R ~ Gamma(mi?/£;R, &r) (with mean tjir km); 
V ~ Af2(my, Q) (km/hour), with my = {m\,m2) T and 0, 



\aia 2 pi2 o\ , 



• Z) ~ Gamma(rafl/fcD, (with mean m£> hours), 

and we assume that R, D and V are mutually independent. To compute this expectation, note first that 
(D — i) + can be integrated out analytically. Second, by conditioning on V, it is possible to integrate over 
R as well. We can then reduce the full computation to this single expectation with respect to V = (Vi, V2): 



a(h) = E V {vT{G mR/kR , kR+2 > cf /2) - 2{k *+* )m / *(G mB/kRikR+1 > cT/2)} , 



(14) 



where Ge- k is a gamma random variable with scale parameter 9 and shape parameter k; its mean equals 
m = 8k. Expression (14) does not have a closed form, but it can be remarkably well approximated by a 
function of the form 

where a is real number which does not depend upon V = (Vj., V2), and that can be estimated with a few 
points by least squares, and where p\ = ||s|| cos(0)/i and /U2 = ||s|| sin(#)/i. Therefore, we have 



a(h) « E y { e -n/W-An) 2 +(VW 2 ) 2 } 



e -oV (fl-ftl) 2 +(«2-M2) 2 e -|(«i-mi;«2-m2)n ^i-mi^-n^) 1, j j.. 

27rdet(fi) 1 /2 



1 



^ / re- ar -« {r2a « )+r6 « )+c(€)} dr (15) 



27rdet(0)V2 
1 

2vrdet(0) 1 /2 J 

1 / 1 ~Mo2(^fy-^^) 2 )^ f r \ e ~2\-^rj dr 



(2vr) 1 / 2 7o Jr + V2^a(t) 



1 /" 27r 1 1 ( „(t\2 



2 Wo y/^) 

where <&(•) is the normal cumulative distribution function and 



(16) 



a(£) = cos 2 (^)<T2 + sin 2 (^)cr 2 - 2 cos(^) sin(^)<ri(T2Pi2 

= 2cos(^)(/xi - mi)cr| + 2sin(£)(/x 2 - m 2 )cJ 2 - 2cos(£)(/i 2 - m 2 )o-icr 2 pi2 - 2sm(£)(/zi - mi)cricr 2 pi 2 

c (0 = (/"1 - "J-i) 2 ^! + (a*2 - "i2) 2 o"? - 2(^i - mi)(// 2 - m 2 )o-icj 2 pi 2 

M(0 = "^"^p = Vdet^VKa det(ft) = a 2 a 2 (l - p 12 ). 

Expression (15) above was computed with a straightforward change of variables v\ = rcos(£) + /xi, 
i> 2 = rsin(£) + ^x 2 , while expression (16) stems from the properties of the normal cumulative function. 
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Since the integral (16) is impossible to handle analytically, we can use a finite approximation to estimate 
a(h), based on 100 points equi-spaced in the interval [0, 2ir]. The approximation seems to be adequate 
when af, a\ > 5, which we impose in the R optimization routine L-BFGS-B. 



C Trivariate extremal coefficients for model (3) 

From equation (2), we know that the multivariate extremal coefficient in dimension N is 



N = V N (1,...,1)=E 



max {VF(xj)} 
i=l,..., N 



This takes values between 1 and N, ranging from complete dependence to asymptotic independence. 
Therefore, the extremal coefficient of order 3 is 

9 3 = E [max{W(xi), W(x 2 ), W(x 3 )}} , 

where, for model (3), W(x) oc max{0, e(x)}/g(x — X), x G X, e{x) being an isotropic Gaussian random 
field with zero mean, variance 1 and correlation function p(h) and Iq being the indicator that the point 
x — X belongs to a random set B (where X is a Poisson process in X). The proportionality constant is 
such that W(x) has mean 1, so it must be 



1 1 V2vr V2ir\X 



E{max(0, e)I B ] E{max(0, e)}E(I B ) Pr(x G B) E{\B\) ' 

Below we write W\ = W(xi), E\ = e(x\), I\ = Ig(xi), Ii ;2; _ = I{%1 G B and x 2 G B and x 3 G' B} and 
so forth. Then the required extremal coefficient is 



6 3 = E{max{W 1 ,W 2 ,W 3 )} 
V2tt 

= 5~7 -^rE {max (0,ei/i, e 2 ^2, £3-^3)} 

Pr(xi G B) 

V2tt \ 

= :r~m E {max (0, £1, e 2 , £3) h-,2-,3} + E {max (0, ei, e 2 ) h-,2 -} 

rr(xi G o) [ 

+E {max (0, £1, e 3 ) h -,3} + E {max (0, e 2 , e 3 ) ^-;2 ;3 } 
+E {max (0, ei) Ji ; - ; -} + E {max (0, e 2 ) I-;2;-} + E {max (0, e 3 ) I-,-,3} 

= Pr(x 2 G B,x 3 G # I xi G B)V2nE {max (0, £1, e 2 , £ 3 )} + Pr(x 2 G x 3 ^ | xi G #)\/27rE {max (0, £1, 
+Pr(x 2 ^ #,x 3 G # I xi G i3)v / 27rE{max(0,ei,e 3 )} + Pr(xi £ B,x 3 £ B \ x 2 £ B)V2nE {max (0, e 2 , e 
+Pr(x 2 g 0, x 3 $ B I xi G tf) + Pr(xi g B, x 3 £ | x 2 G 0) + Pr(x x £ B, x 2 $ B \ x 3 G B). 



The expression V 27rE {max (0, ei,e 2 , £ 3 )} above is merely the trivariate extremal coefficient for the Schlather 
model without random sets, which can be evaluated quickly and accurately by simulation, whereas 
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t/2tt~E {max (0, £j, Ej)} is the bivariate extremal coefficient between station i and station j, which can 
be computed analytically with the exponent measure Vi-j(l, 1). 

The probabilities above correspond to the normalized expected volumes of overlap of three sets B 
centered at x±, x 2 and x 3 . For example, 

Pr(x 2 g B, x 3 g B I x x G B) = E{\B n {B + (x 2 - xi)} n {6 + (s 3 - xi)}|}/E(|B|), 

Pr(x 2 £ B,x 3 ^ B \ xi £ B) = E[\B C\{B + (x 2 - x ± )} n{B + (x 3 - Xl )} c \]/E(\B\). 

For given radius R, duration D and velocity V, the random set is fixed and the volume of overlap 
can be calculated analytically. Simulation can then be used to compute the expectation of such random 
quantities. 

The same approach could be used to compute extremal coefficients at a higher order N, at the price 
of needing to compute by hand all the areas of overlap between N discs with same radius. 
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Figure 1: Asymptotic efficiency of maximum pairwise likelihood estimators relative to the maximum like- 
lihood estimator, as a function of the maximum time lag included in the pairwise likelihood. 
The pairwise likelihood of equation (8) is modified accordingly by setting 5 = 1 and replac- 
ing ip u by the corresponding pairwise density. Top row: K* = {1,...,K}. Bottom row: 
)Cf = {2 k ~ l ; k = 1, . . . , K}. Left column: AR(1) process (Z t - fi) = X(Z t -i - //) + Et, with 
£t ~ A/"(0,cj 2 ), a > 0, fx G K, |A| < 1. Right column: MA(1) process Z t = \i + Et + Aet_i, with 
e t ~ Af(0, a 2 ), a > 0, fi G R, \X\ < 1. The parameters are 6 = 0.6, a 2 = 1, and T = 500. 
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Figure 2: Top: Simulation of the Schlather model at a particular location with beta distributed random 
sets. The correlation is exponential with range parameter A = 4, giving an effective range of 
12. The blue line represents the 0.95-quantile. Bottom: Boxplots (with corresponding mean 
squared errors) of the estimates of log A (based on 300 replications) using pairs at lag 1 only, for 
an increasing number of observations T. The true value is the horizontal red line at log 4 ~ 1.38. 
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Figure 3: Topographic map of Switzerland, showing the location and altitude of the monitoring stations 
used. Their elevations are all close to 500 m above mean sea level (amsl), except for three 
stations (FRE, NAP, PLF) at about 1000 m amsl. The scales of the x and y axes correspond to 
the Swiss coordinate system. The closest stations (FRE, MAH) are 10 km apart and the most 
distant ones (CHZ, MAH) are 151 km apart. 
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Figure 6: Illustration of the random set element B in space S (horizontal plane) and time T (vertical 
axis). The storms are conceptualized as random disks with a random radius moving at a 
random velocity for a random duration. The red tilted cylinder represents a realization B of 
such a storm in 5x7", and the blue one is B + h, for a given vector h. The coefficient a(h) 
needed for the fitting is the expected volume of intersection between the two cylinders. 
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Figure 7: Comparison of empirical estimates of all pairwise (left) and trivariate (right) extremal coeffi- 
cients for the rainfall data with their model-based counterparts. The light-grey vertical lines 
are 95% confidence intervals. A perfect agreement would place all points on the grey diagonal 
line. 
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